AAAI.2022 - Domain(s) Of Application

Total: 37

#1 Can Machines Read Coding Manuals Yet? – A Benchmark for Building Better Language Models for Code Understanding

Authors: Ibrahim Abdelaziz ; Julian Dolby ; Jamie McCusker ; Kavitha Srinivas

Code understanding is an increasingly important application of Artificial Intelligence. A fundamental aspect of understanding code is understanding text about code, e.g., documentation and forum discussions. Pre-trained language models (e.g., BERT) are a popular approach for various NLP tasks, and there are now a variety of benchmarks, such as GLUE, to help improve the development of such models for natural language understanding. However, little is known about how well such models work on textual artifacts about code, and we are unaware of any systematic set of downstream tasks for such an evaluation. In this paper, we derive a set of benchmarks (BLANCA - Benchmarks for LANguage models on Coding Artifacts) that assess code understanding based on tasks such as predicting the best answer to a question in a forum post, finding related forum posts, or predicting classes related in a hierarchy from class documentation. We evaluate the performance of current state-of-the-art language models on these tasks and show that there is significant improvement on each task from fine-tuning. We also show that multi-task training over BLANCA tasks helps build better language models for code understanding.
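
Tasks like best-answer prediction reduce to sentence-pair classification over (question, answer) text, so fine-tuning a pre-trained language model follows the standard Hugging Face recipe. The sketch below is a minimal, hypothetical setup: the checkpoint, labels, and example texts are illustrative assumptions, and BLANCA's actual data loading, splits, and metrics are not shown in the abstract.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Generic sentence-pair classification setup; BLANCA-specific data handling is omitted.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

question = "How do I reverse a list in Python?"
answer = "Use list.reverse() for in-place reversal or reversed() for an iterator."
label = torch.tensor([1])  # 1 = best answer, 0 = not (illustrative labels)

inputs = tokenizer(question, answer, truncation=True, return_tensors="pt")
outputs = model(**inputs, labels=label)
outputs.loss.backward()  # an optimizer step would follow in a real fine-tuning loop
```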

#2 No Task Left Behind: Multi-Task Learning of Knowledge Tracing and Option Tracing for Better Student Assessment

Authors: Suyeong An ; Junghoon Kim ; Minsam Kim ; Juneyoung Park

Student assessment is one of the most fundamental tasks in the field of AI Education (AIEd). One of the most common approaches to student assessment is Knowledge Tracing (KT), which evaluates a student's knowledge state by predicting whether the student will answer a given question correctly or not. However, in the context of multiple-choice (polytomous) questions, conventional KT approaches are limited in that they only consider the binary (dichotomous) correctness label (i.e., correct or incorrect) and disregard the specific option chosen by the student. Meanwhile, Option Tracing (OT) attempts to model a student by predicting which option they will choose for a given question, but overlooks the correctness information. In this paper, we propose Dichotomous-Polytomous Multi-Task Learning (DP-MTL), a multi-task learning framework that combines KT and OT for more precise student assessment. In particular, we show that the KT objective acts as a regularization term for OT in the DP-MTL framework, and propose an appropriate architecture for applying our method on top of existing deep learning-based KT models. We experimentally confirm that DP-MTL significantly improves both KT and OT performance, and also benefits downstream tasks such as Score Prediction (SP).
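
The abstract does not spell out the combined objective, but the core idea of training a dichotomous (correctness) head and a polytomous (option) head on a shared student-state encoder can be sketched as a weighted sum of the two losses. All names, shapes, and the weighting scheme below are illustrative assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DPMTLHead(nn.Module):
    """Joint dichotomous (correctness) and polytomous (option) heads on top of a
    shared student-state encoder. Names and the weighted-sum combination are
    illustrative assumptions, not the paper's exact design."""

    def __init__(self, hidden_dim: int, num_options: int = 4, kt_weight: float = 0.5):
        super().__init__()
        self.kt_head = nn.Linear(hidden_dim, 1)            # Knowledge Tracing: correct / incorrect
        self.ot_head = nn.Linear(hidden_dim, num_options)  # Option Tracing: which option is chosen
        self.kt_weight = kt_weight

    def forward(self, state, correct_labels, option_labels):
        # state: (batch, hidden_dim) encoding of the student's interaction history
        kt_logit = self.kt_head(state).squeeze(-1)
        ot_logits = self.ot_head(state)
        kt_loss = F.binary_cross_entropy_with_logits(kt_logit, correct_labels.float())
        ot_loss = F.cross_entropy(ot_logits, option_labels)
        # the KT term acts as a regularizer on the option-tracing objective
        return self.kt_weight * kt_loss + (1.0 - self.kt_weight) * ot_loss

# toy usage
head = DPMTLHead(hidden_dim=64)
state = torch.randn(8, 64)
loss = head(state, torch.randint(0, 2, (8,)), torch.randint(0, 4, (8,)))
loss.backward()
```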

#3 Diaformer: Automatic Diagnosis via Symptoms Sequence Generation

Authors: Junying Chen ; Dongfang Li ; Qingcai Chen ; Wenxiu Zhou ; Xin Liu

Automatic diagnosis has attracted increasing attention but remains challenging due to multi-step reasoning. Recent works usually address it with reinforcement learning methods. However, these methods show low efficiency and require task-specific reward functions. Considering that the conversation between doctor and patient allows doctors to probe for symptoms and make diagnoses, the diagnosis process can be naturally seen as the generation of a sequence including symptoms and diagnoses. Inspired by this, we reformulate automatic diagnosis as a symptom Sequence Generation (SG) task and propose a simple but effective automatic Diagnosis model based on Transformer (Diaformer). We first design a symptom attention framework to learn the generation of symptom inquiries and disease diagnoses. To alleviate the discrepancy between sequential generation and the unordered nature of implicit symptoms, we further design three orderless training mechanisms. Experiments on three public datasets show that our model outperforms baselines on disease diagnosis by 1%, 6% and 11.5% with the highest training efficiency. Detailed analysis of symptom inquiry prediction demonstrates the potential of applying symptom sequence generation to automatic diagnosis.

#4 Zero-Shot Audio Source Separation through Query-Based Learning from Weakly-Labeled Data

Authors: Ke Chen ; Xingjian Du ; Bilei Zhu ; Zejun Ma ; Taylor Berg-Kirkpatrick ; Shlomo Dubnov

Deep learning techniques for separating audio into different sound sources face several challenges. Standard architectures require training separate models for different types of audio sources. Although some universal separators employ a single model to target multiple sources, they have difficulty generalizing to unseen sources. In this paper, we propose a three-component pipeline to train a universal audio source separator from a large, but weakly-labeled dataset: AudioSet. First, we propose a transformer-based sound event detection system for processing weakly-labeled training data. Second, we devise a query-based audio separation model that leverages this data for model training. Third, we design a latent embedding processor to encode queries that specify audio targets for separation, allowing for zero-shot generalization. Our approach uses a single model for source separation of multiple sound types, and relies solely on weakly-labeled data for training. In addition, the proposed audio separator can be used in a zero-shot setting, learning to separate types of audio sources that were never seen in training. To evaluate the separation performance, we test our model on MUSDB18, while training on the disjoint AudioSet. We further verify the zero-shot performance by conducting another experiment on audio source types that are held-out from training. The model achieves comparable Source-to-Distortion Ratio (SDR) performance to current supervised models in both cases.

#5 DeepHardMark: Towards Watermarking Neural Network Hardware

Authors: Joseph Clements ; Yingjie Lao

This paper presents a framework for embedding watermarks into DNN hardware accelerators. Unlike previous works that have looked at protecting the algorithmic intellectual property of deep learning systems, this work proposes a methodology for defending deep learning hardware. Our methodology embeds modifications into the hardware accelerator's functional blocks that can be revealed with the rightful owner's key DNN and corresponding key sample, verifying the legitimate owner. We propose an Lp-box ADMM based algorithm to co-optimize the watermark's hardware overhead and its impact on the design's algorithmic functionality. We evaluate the performance of the hardware watermarking scheme on popular image classifier models using various accelerator designs. Our results demonstrate that the proposed methodology effectively embeds watermarks while preserving the original functionality of the hardware architecture. Specifically, we can successfully embed watermarks into the deep learning hardware and reliably execute a ResNet ImageNet classifier with an accuracy degradation of only 0.009%.
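
Lp-box ADMM rests on the observation that the binary set {0,1}^n equals the intersection of the box [0,1]^n with the sphere of radius sqrt(n)/2 centered at 0.5·1, so the ADMM iterations alternate between projections onto these two sets. A minimal sketch of those two projections is shown below; the paper's actual watermark objective (hardware overhead versus functional impact) is not reproduced.

```python
import numpy as np

def project_box(x):
    """Project onto the unit box [0, 1]^n."""
    return np.clip(x, 0.0, 1.0)

def project_sphere(x):
    """Project onto the sphere {x : ||x - 0.5*1||_2 = sqrt(n)/2}.
    Intersected with the box, this sphere carves out exactly the binary set {0,1}^n,
    which is the reformulation Lp-box ADMM exploits."""
    n = x.size
    center = 0.5 * np.ones(n)
    radius = np.sqrt(n) / 2.0
    d = x - center
    norm = np.linalg.norm(d)
    if norm == 0.0:
        d = np.random.randn(n)  # arbitrary direction if x sits exactly at the center
        norm = np.linalg.norm(d)
    return center + radius * d / norm

# In the full method, these projections alternate with gradient/ADMM updates of the
# watermarking objective; here we only show the projections themselves.
x = np.random.rand(16)
print(project_box(x), project_sphere(x))
```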

#6 A Unified Framework for Real Time Motion Completion

Authors: Yinglin Duan ; Yue Lin ; Zhengxia Zou ; Yi Yuan ; Zhehui Qian ; Bohan Zhang

Motion completion, as a challenging and fundamental problem, is of great significance in film and game applications. For different motion completion scenarios (in-betweening, in-filling, and blending), most previous methods handle the completion problem with case-by-case methodology designs. In this work, we propose a simple but effective method that solves multiple motion completion problems under a unified framework and achieves a new state-of-the-art accuracy on LaFAN1 (+17% better than the previous sota) under multiple evaluation settings. Inspired by the recent success of self-attention-based transformer models, we consider completion as a sequence-to-sequence prediction problem. Our method consists of three modules: a standard transformer encoder with self-attention that learns long-range dependencies of input motions, a trainable mixture embedding module that models temporal information and encodes different key-frame combinations in a unified form, and a new motion perceptual loss for better capturing high-frequency movements. Our method can predict multiple missing frames within a single forward propagation in real time and removes the need for post-processing. We also introduce a novel large-scale dance movement dataset for exploring the scaling capability of our method and its effectiveness in complex motion applications.

#7 FactorVAE: A Probabilistic Dynamic Factor Model Based on Variational Autoencoder for Predicting Cross-Sectional Stock Returns

Authors: Yitong Duan ; Lei Wang ; Qizhong Zhang ; Jian Li

As an asset pricing model in economics and finance, the factor model has been widely used in quantitative investment. Towards building more effective factor models, recent years have witnessed a paradigm shift from linear models to more flexible, nonlinear, data-driven machine learning models. However, due to the low signal-to-noise ratio of financial data, it is quite challenging to learn effective factor models. In this paper, we propose a novel factor model, FactorVAE, as a probabilistic model with inherent randomness for noise modeling. Essentially, our model integrates the dynamic factor model (DFM) with the variational autoencoder (VAE) from machine learning, and we propose a prior-posterior learning method based on the VAE, which can effectively guide the learning of the model by approximating an optimal posterior factor model with future information. In particular, considering that risk modeling is important for noisy stock data, FactorVAE can estimate variances from the distribution over the latent space of the VAE, in addition to predicting returns. Experiments on real stock market data demonstrate the effectiveness of FactorVAE, which outperforms various baseline methods.
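
As a rough, illustrative skeleton of the idea (not the paper's architecture), one can picture an encoder that outputs a distribution over a handful of latent factors and a linear "exposure" decoder that maps sampled factors to returns, with a predictive variance read off from the factor distribution and the loadings. Everything below, including the shapes and the per-stock treatment, is an assumption for illustration.

```python
import torch
import torch.nn as nn

class TinyFactorVAE(nn.Module):
    """Illustrative skeleton only: an encoder producing a distribution over K latent
    factors and a linear 'exposure' decoder mapping factors to returns. The paper's
    prior-posterior training with future information is not reproduced, and treating
    each stock independently is a simplification."""

    def __init__(self, num_features: int, num_factors: int = 4):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(num_features, 32), nn.ReLU())
        self.mu = nn.Linear(32, num_factors)
        self.logvar = nn.Linear(32, num_factors)
        self.exposure = nn.Linear(num_factors, 1)  # factor loadings

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        pred_return = self.exposure(z).squeeze(-1)
        # variance of a linear combination of independent factors: sum_k w_k^2 * var_k
        pred_var = (self.exposure.weight.pow(2) * logvar.exp()).sum(dim=1)
        return pred_return, pred_var, mu, logvar

model = TinyFactorVAE(num_features=10)
ret, var, mu, logvar = model(torch.randn(5, 10))  # 5 stocks, 10 characteristics each
```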

#8 AXM-Net: Implicit Cross-Modal Feature Alignment for Person Re-identification

Authors: Ammarah Farooq ; Muhammad Awais ; Josef Kittler ; Syed Safwan Khalid

Cross-modal person re-identification (Re-ID) is critical for modern video surveillance systems. The key challenge is to align cross-modal representations according to the semantic information present for a person while ignoring background information. This work presents a novel convolutional neural network (CNN) based architecture designed to learn semantically aligned cross-modal visual and textual representations. The underlying building block, named AXM-Block, is a unified multi-layer network that dynamically exploits multi-scale knowledge from both modalities and re-calibrates each modality according to shared semantics. To complement the convolutional design, contextual attention is applied in the text branch to capture long-term dependencies. Moreover, we propose a unique design to enhance visual part-based feature coherence and locality information. Our framework is novel in its ability to implicitly learn aligned semantics between modalities during the feature learning stage. The unified feature learning effectively utilizes textual data as a super-annotation signal for visual representation learning and automatically rejects irrelevant information. The entire AXM-Net is trained end-to-end on the CUHK-PEDES data. We report results on two tasks, person search and cross-modal Re-ID. AXM-Net outperforms the current state-of-the-art (SOTA) methods and achieves 64.44% Rank@1 on the CUHK-PEDES test set. It also outperforms them by more than 10% in cross-viewpoint text-to-image Re-ID scenarios on the CrossRe-ID and CUHK-SYSU datasets.

#9 SCIR-Net: Structured Color Image Representation Based 3D Object Detection Network from Point Clouds

Authors: Qingdong He ; Hao Zeng ; Yi Zeng ; Yijun Liu

3D object detection from point cloud data has become an indispensable part of autonomous driving. Previous works for processing point clouds rely on either projection or voxelization. However, projection-based methods suffer from information loss, while voxelization-based methods incur heavy computation. In this paper, we propose to encode point clouds into a structured color image representation (SCIR) and utilize a 2D CNN to fulfill the 3D detection task. Specifically, we use a structured color image encoding module to convert the irregular 3D point cloud into a squared 2D tensor image, where each point corresponds to a spatial point in 3D space. Furthermore, in order to fit the Euclidean structure, we apply feature normalization to parameterize the 2D tensor image onto a regular dense color image. Then, we conduct repeated multi-scale fusion across different levels so as to augment the initial features and learn scale-aware feature representations for box prediction. Extensive experiments on the KITTI benchmark, the Waymo Open Dataset and the more challenging nuScenes dataset show that our proposed method yields decent results and demonstrate the effectiveness of such representations for point clouds.
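
The central trick is turning an irregular point cloud into a dense image-like tensor that a 2D CNN can consume. As a stand-in for the paper's SCIR encoding and feature normalization, the sketch below uses a simple spherical (range-image style) projection that stores xyz coordinates as the three "color" channels; the resolution and projection choices are assumptions for illustration only.

```python
import numpy as np

def points_to_image(points, height=64, width=512):
    """Roughly rasterize an (N, 3) point cloud into an (H, W, 3) tensor using a
    spherical projection, storing xyz as channels. This is only a stand-in for
    the paper's SCIR encoding and normalization."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-8
    yaw = np.arctan2(y, x)                      # [-pi, pi]
    pitch = np.arcsin(np.clip(z / r, -1, 1))    # [-pi/2, pi/2]
    u = ((yaw + np.pi) / (2 * np.pi) * (width - 1)).astype(int)
    v = ((pitch + np.pi / 2) / np.pi * (height - 1)).astype(int)
    image = np.zeros((height, width, 3), dtype=np.float32)
    image[v, u] = points                        # later points overwrite earlier ones
    return image

img = points_to_image(np.random.randn(2048, 3))
print(img.shape)  # (64, 512, 3)
```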

#10 Learning and Dynamical Models for Sub-seasonal Climate Forecasting: Comparison and Collaboration

Authors: Sijie He ; Xinyan Li ; Laurie Trenary ; Benjamin A Cash ; Timothy DelSole ; Arindam Banerjee

Sub-seasonal forecasting (SSF) is the prediction of key climate variables such as temperature and precipitation on the 2-week to 2-month time horizon. Skillful SSF would have substantial societal value in areas such as agricultural productivity, hydrology and water resource management, and emergency planning for extreme events such as droughts and wildfires. Despite its societal importance, SSF has remained a challenging problem compared to both short-term weather forecasting and long-term seasonal forecasting. Recent studies have shown the potential of machine learning (ML) models to advance SSF. In this paper, for the first time, we perform a fine-grained comparison of a suite of modern ML models with state-of-the-art physics-based dynamical models from the Subseasonal Experiment (SubX) project for SSF in the western contiguous United States. Additionally, we explore mechanisms to enhance the ML models by using forecasts from dynamical models. Empirical results illustrate that, on average, ML models outperform dynamical models, although the ML models tend to generate forecasts with conservative magnitudes compared to the SubX models. Further, we illustrate that ML models make forecasting errors under extreme weather conditions, e.g., cold waves due to the polar vortex, highlighting the need for separate models for extreme events. Finally, we show that suitably incorporating dynamical model forecasts as inputs to ML models can substantially improve their forecasting performance. The SSF dataset constructed for this work and the code for the ML models are released along with the paper for the benefit of the artificial intelligence community.

#11 Solving PDE-Constrained Control Problems Using Operator Learning

Authors: Rakhoon Hwang ; Jae Yong Lee ; Jin Young Shin ; Hyung Ju Hwang

The modeling and control of complex physical systems are essential in real-world problems. We propose a novel framework that is generally applicable to solving PDE-constrained optimal control problems by introducing surrogate models for PDE solution operators with special regularizers. The procedure of the proposed framework is divided into two phases: solution operator learning for PDE constraints (Phase 1) and searching for optimal control (Phase 2). Once the surrogate model is trained in Phase 1, the optimal control can be inferred in Phase 2 without intensive computations. Our framework can be applied to both data-driven and data-free cases. We demonstrate the successful application of our method to various optimal control problems for different control variables with diverse PDE constraints from the Poisson equation to Burgers' equation.
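
Phase 2 can be pictured as ordinary gradient descent on the control variable, with gradients flowing through the frozen surrogate operator from Phase 1. In the sketch below, `surrogate`, the quadratic tracking objective, the regularization weight, and the problem sizes are all hypothetical placeholders, not the paper's setup.

```python
import torch

# Phase 2 sketch: gradient-based search for a control through a frozen surrogate.
# `surrogate` stands in for the learned PDE solution operator obtained in Phase 1.
surrogate = torch.nn.Sequential(
    torch.nn.Linear(16, 64), torch.nn.Tanh(), torch.nn.Linear(64, 16)
)
for p in surrogate.parameters():
    p.requires_grad_(False)          # the operator stays fixed during the control search

target_state = torch.zeros(16)       # desired PDE solution (e.g., values on a coarse grid)
control = torch.zeros(16, requires_grad=True)
opt = torch.optim.Adam([control], lr=1e-2)

for step in range(500):
    opt.zero_grad()
    state = surrogate(control)                       # cheap approximate PDE solve
    loss = (state - target_state).pow(2).mean() \
         + 1e-3 * control.pow(2).mean()              # control-cost regularizer
    loss.backward()
    opt.step()
```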

#12 Proxy Learning of Visual Concepts of Fine Art Paintings from Styles through Language Models

Authors: Diana Kim ; Ahmed Elgammal ; Marian Mazzone

We present a machine learning system that can quantify fine art paintings with a set of visual elements and principles of art. Such formal analysis is fundamental for understanding art, but developing such a system is challenging. Paintings have high visual complexity, and it is also difficult to collect enough training data with direct labels. To resolve these practical limitations, we introduce a novel mechanism, called proxy learning, which learns visual concepts in paintings through their general relation to styles. This framework does not require any visual annotation; it only uses style labels and a general relationship between visual concepts and styles. In this paper, we propose a novel proxy model and reformulate four pre-existing methods in the context of proxy learning. Through quantitative and qualitative comparison, we evaluate these methods and compare their effectiveness in quantifying artistic visual concepts, where the general relationship is estimated by language models (GloVe or BERT). Language modeling is a practical and scalable solution requiring no labeling, but it is inevitably imperfect. We demonstrate how the new proxy model is robust to this imperfection, while the other methods are sensitively affected by it.

#13 SPATE-GAN: Improved Generative Modeling of Dynamic Spatio-Temporal Patterns with an Autoregressive Embedding Loss

Authors: Konstantin Klemmer ; Tianlin Xu ; Beatrice Acciaio ; Daniel B. Neill

From ecology to atmospheric sciences, many academic disciplines deal with data characterized by intricate spatio-temporal complexities, the modeling of which often requires specialized approaches. Generative models of these data are of particular interest, as they enable a range of impactful downstream applications like simulation or creating synthetic training data. Recently, COT-GAN, a new GAN algorithm inspired by the theory of causal optimal transport (COT), was proposed in an attempt to improve generation of sequential data. However, the task of learning complex patterns over time and space requires additional knowledge of the specific data structures. In this study, we propose a novel loss objective combined with COT-GAN based on an autoregressive embedding to reinforce the learning of spatio-temporal dynamics. We devise SPATE (spatio-temporal association), a new metric measuring spatio-temporal autocorrelation. We compute SPATE for real and synthetic data samples and use it to compute an embedding loss that considers space-time interactions, nudging the GAN to learn outputs that are faithful to the observed dynamics. We test our new SPATE-GAN on a diverse set of spatio-temporal patterns: turbulent flows, log-Gaussian Cox processes and global weather data. We show that our novel embedding loss improves performance without any changes to the architecture of the GAN backbone, highlighting our model's increased capacity for capturing autoregressive structures.

#14 Intra-Inter Subject Self-Supervised Learning for Multivariate Cardiac Signals

Authors: Xiang Lan ; Dianwen Ng ; Shenda Hong ; Mengling Feng

Learning information-rich and generalizable representations effectively from unlabeled multivariate cardiac signals to identify abnormal heart rhythms (cardiac arrhythmias) is valuable in real-world clinical settings but often challenging due to their complex temporal dynamics. Cardiac arrhythmias can vary significantly in temporal patterns even for the same patient (i.e., intra-subject difference). Meanwhile, the same type of cardiac arrhythmia can show different temporal patterns among different patients due to different cardiac structures (i.e., inter-subject difference). In this paper, we address these challenges by proposing an Intra-Inter Subject Self-Supervised Learning (ISL) model that is customized for multivariate cardiac signals. Our proposed ISL model integrates medical knowledge into self-supervision to effectively learn from intra- and inter-subject differences. In intra-subject self-supervision, the ISL model first extracts heartbeat-level features from each subject using a channel-wise attentional CNN-RNN encoder. Then a stationarity test module is employed to capture the temporal dependencies between heartbeats. In inter-subject self-supervision, we design a set of data augmentations according to the clinical characteristics of cardiac signals and perform contrastive learning among subjects to learn distinctive representations for various types of patients. Extensive experiments on three real-world datasets were conducted. In a semi-supervised transfer learning scenario, our pre-trained ISL model yields about a 10% improvement over supervised training when only 1% of labeled data is available, suggesting strong generalizability and robustness of the model.
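
The inter-subject step is contrastive learning over augmented views of cardiac signals. As a generic illustration (ISL's clinically motivated augmentations and exact objective differ), a standard NT-Xent loss over two embedded views looks like this:

```python
import torch
import torch.nn.functional as F

def ntxent_loss(z1, z2, temperature=0.1):
    """Standard NT-Xent contrastive loss between two augmented views.
    Only illustrates the contrastive-learning step; ISL's actual
    augmentations and objective are domain-specific."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    n = z1.size(0)
    z = torch.cat([z1, z2], dim=0)                 # (2N, D)
    sim = z @ z.t() / temperature                  # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))              # exclude self-pairs
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

z1 = torch.randn(32, 128)   # view-1 embeddings (e.g., augmented ECG segments)
z2 = torch.randn(32, 128)   # view-2 embeddings of the same subjects
loss = ntxent_loss(z1, z2)
```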

#15 GeomGCL: Geometric Graph Contrastive Learning for Molecular Property Prediction

Authors: Shuangli Li ; Jingbo Zhou ; Tong Xu ; Dejing Dou ; Hui Xiong

Recently, many efforts have been devoted to applying graph neural networks (GNNs) to molecular property prediction, which is a fundamental task for computational drug and material discovery. One of the major obstacles hindering successful molecular property prediction by GNNs is the scarcity of labeled data. Though graph contrastive learning (GCL) methods have achieved extraordinary performance with insufficient labeled data, most focus on designing data augmentation schemes for general graphs. However, the fundamental properties of a molecule can be altered by augmentation methods (such as random perturbation) on molecular graphs. Meanwhile, the critical geometric information of molecules remains rarely explored under current GNN and GCL architectures. To this end, we propose a novel graph contrastive learning method utilizing the geometry of the molecule across 2D and 3D views, named GeomGCL. Specifically, we first devise a dual-view geometric message passing network (GeomMPNN) to adaptively leverage the rich information of both 2D and 3D graphs of a molecule. The incorporation of geometric properties at different levels can greatly facilitate molecular representation learning. Then a novel geometric graph contrastive scheme is designed to make both geometric views collaboratively supervise each other to improve the generalization ability of GeomMPNN. We evaluate GeomGCL on various downstream property prediction tasks via a fine-tuning process. Experimental results on seven real-life molecular datasets demonstrate the effectiveness of our proposed GeomGCL against state-of-the-art baselines.

#16 OAM: An Option-Action Reinforcement Learning Framework for Universal Multi-Intersection Control

Authors: Enming Liang ; Zicheng Su ; Chilin Fang ; Renxin Zhong

Efficient traffic signal control is an important means of alleviating urban traffic congestion. Reinforcement learning (RL) has shown great potential in devising optimal signal plans that can adapt to dynamic traffic congestion. However, several challenges still need to be overcome. Firstly, a paradigm of state, action, and reward design is needed, especially for an optimality-guaranteed reward function. Secondly, the generalization of RL algorithms is hindered by the varied topologies and physical properties of intersections. Lastly, enhancing the cooperation between intersections is needed for large network applications. To address these issues, the Option-Action RL framework for universal Multi-intersection control (OAM) is proposed. Based on the well-known cell transmission model, we first define a lane-cell-level state to better model traffic flow propagation. Based on these physical queuing dynamics, we propose a regularized delay as the reward to facilitate temporal credit assignment while maintaining equivalence with minimizing the average travel time. We then recapitulate the phase actions as constrained combinations of lane options and design a universal neural network structure to realize model generalization to any intersection with any phase definition. Multiple-intersection cooperation is then rigorously discussed using potential game theory. We test the OAM algorithm on four networks with different settings, including a city-level scenario with 2,048 intersections using synthetic and real-world datasets. The results show that OAM can outperform state-of-the-art controllers in reducing the average travel time.

#17 End-to-End Line Drawing Vectorization

Authors: Hanyuan Liu ; Chengze Li ; Xueting Liu ; Tien-Tsin Wong

Vector graphics are broadly used in a variety of forms, such as illustrations, logos, posters, billboards, and printed ads. Despite this broad use, many artists still prefer to draw with pen and paper, which leads to a high demand for converting raster designs into vector form. In particular, line drawing is a primary art form and has attracted many research efforts on automatically converting raster line drawings to vector form. However, existing methods generally adopt a two-step approach: stroke segmentation and vectorization. Without vector guidance, raster-based stroke segmentation frequently yields unsatisfying results, such as over-grouped strokes and broken strokes. In this paper, we propose an end-to-end vectorization method which directly generates vectorized stroke primitives from a raster line drawing in one step. We propose a Transformer-based framework that performs stroke tracing, as a human does, in an automatic stroke-by-stroke way, with a novel stroke feature representation and multi-modal supervision to achieve vectorization with high quality and fidelity. Qualitative and quantitative evaluations show that our method achieves state-of-the-art performance.

#18 Context-Aware Health Event Prediction via Transition Functions on Dynamic Disease Graphs

Authors: Chang Lu ; Tian Han ; Yue Ning

With the wide application of electronic health records (EHR) in healthcare facilities, health event prediction with deep learning has gained more and more attention. A common feature of EHR data used for deep-learning-based predictions is historical diagnoses. Existing work mainly regards a diagnosis as an independent disease and does not consider clinical relations among diseases in a visit. Many machine learning approaches assume disease representations are static across different visits of a patient. However, in real practice, multiple diseases that are frequently diagnosed at the same time reflect hidden patterns that are conducive to prognosis. Moreover, the development of a disease is not static, since some diseases can emerge or disappear and show various symptoms in different visits of a patient. To effectively utilize this combinational disease information and explore the dynamics of diseases, we propose a novel context-aware learning framework using transition functions on dynamic disease graphs. Specifically, we construct a global disease co-occurrence graph with multiple node properties for disease combinations. We design dynamic subgraphs for each patient's visit to leverage global and local contexts. We further define three diagnosis roles in each visit based on the variation of node properties to model disease transition processes. Experimental results on two real-world EHR datasets show that the proposed model outperforms the state of the art in predicting health events.
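
The starting point is a global disease co-occurrence graph built from visit records. A minimal sketch of that construction, counting how often two diagnosis codes appear in the same visit, is given below; the paper's node properties and dynamic per-visit subgraphs are not reproduced, and the example codes are illustrative.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence_graph(visits):
    """Count how often two diagnosis codes appear in the same visit.
    `visits` is a list of visits, each a list of diagnosis codes."""
    edge_counts = Counter()
    node_counts = Counter()
    for codes in visits:
        unique = sorted(set(codes))
        node_counts.update(unique)
        edge_counts.update(combinations(unique, 2))  # undirected co-occurrence edges
    return node_counts, edge_counts

# toy visits with ICD-10-style codes
visits = [["I10", "E11", "N18"], ["I10", "E11"], ["N18", "I50"]]
nodes, edges = build_cooccurrence_graph(visits)
print(edges[("E11", "I10")])  # 2: these two codes co-occur in two visits
```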

#19 Hyperverlet: A Symplectic Hypersolver for Hamiltonian Systems

Authors: Frederik Baymler Mathiesen ; Bin Yang ; Jilin Hu

Hamiltonian systems represent an important class of dynamical systems, including pendulums, molecular dynamics, and cosmic systems. The choice of solver significantly affects the accuracy when simulating Hamiltonian systems, and symplectic solvers are of particular importance. Recent advances in neural network-based hypersolvers, though achieving competitive results, still lack the symplecticity necessary for reliable simulations, especially over long time horizons. To alleviate this, we introduce Hyperverlet, a new hypersolver composing the traditional symplectic velocity Verlet solver with symplectic neural network-based solvers. More specifically, we propose a parameterization of symplectic neural networks and prove that the hyperbolic tangent is r-finite, expanding the set of allowable activation functions for symplectic neural networks and improving accuracy. Extensive experiments on a spring-mass and a pendulum system justify the design choices and suggest that Hyperverlet outperforms both traditional solvers and hypersolvers.
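
For reference, the classical velocity Verlet step that Hyperverlet builds on is straightforward to write down; the learned symplectic correction terms that the paper adds around each step are omitted in this sketch.

```python
import numpy as np

def velocity_verlet(q, p, force, mass, dt, steps):
    """Plain symplectic velocity Verlet for a separable Hamiltonian
    H(q, p) = p^2 / (2m) + V(q). Hyperverlet augments each step with learned
    symplectic corrections, which are not shown here."""
    traj = [(q, p)]
    for _ in range(steps):
        p_half = p + 0.5 * dt * force(q)      # half kick
        q = q + dt * p_half / mass            # drift
        p = p_half + 0.5 * dt * force(q)      # half kick with the updated force
        traj.append((q, p))
    return traj

# spring-mass example: V(q) = 0.5 * k * q^2, so force(q) = -k * q
k, m = 1.0, 1.0
traj = velocity_verlet(q=1.0, p=0.0, force=lambda q: -k * q, mass=m, dt=0.1, steps=100)
```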

#20 Learning Human Driving Behaviors with Sequential Causal Imitation Learning

Authors: Kangrui Ruan ; Xuan Di

Learning human driving behaviors is an efficient approach for self-driving vehicles. Traditional Imitation Learning (IL) methods assume that the expert demonstrations follow Markov Decision Processes (MDPs). However, in reality, this assumption does not always hold true. Spurious correlations may exist through the paths of historical variables because of the existence of unobserved confounders. Accounting for the latent causal relationships from unobserved variables to outcomes, this paper proposes Sequential Causal Imitation Learning (SeqCIL) for imitating driver behaviors. We develop a sequential causal template that generalizes the default MDP setting to one with Unobserved Confounders (MDPUC-HD). We then develop a sufficient graphical criterion to determine when ignoring causality leads to poor performance in MDPUC-HD. Through the framework of Adversarial Imitation Learning, we develop a procedure to imitate the expert policy by blocking π-backdoor paths at each time step. Our method is evaluated on a synthetic dataset and a real-world highway driving dataset, both demonstrating that the proposed procedure significantly outperforms non-causal imitation learning methods.

#21 EMVLight: A Decentralized Reinforcement Learning Framework for Efficient Passage of Emergency Vehicles

Authors: Haoran Su ; Yaofeng Desmond Zhong ; Biswadip Dey ; Amit Chakraborty

Emergency vehicles (EMVs) play a crucial role in responding to time-critical events such as medical emergencies and fire outbreaks in urban areas. The less time EMVs spend traveling through traffic, the more likely they are to save people's lives and reduce property loss. To reduce the travel time of EMVs, prior work has used route optimization based on historical traffic-flow data and traffic signal pre-emption based on the optimal route. However, traffic signal pre-emption dynamically changes the traffic flow, which, in turn, modifies the optimal route of an EMV. In addition, traffic signal pre-emption practices usually lead to significant disturbances in traffic flow and subsequently increase the travel time for non-EMVs. In this paper, we propose EMVLight, a decentralized reinforcement learning (RL) framework for simultaneous dynamic routing and traffic signal control. EMVLight extends Dijkstra's algorithm to efficiently update the optimal route for an EMV in real time as it travels through the traffic network. The decentralized RL agents learn network-level cooperative traffic signal phase strategies that not only reduce EMV travel time but also reduce the average travel time of non-EMVs in the network. This benefit is demonstrated through comprehensive experiments with synthetic and real-world maps, which show that EMVLight outperforms benchmark transportation engineering techniques and existing RL-based signal control methods.
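
The routing component builds on Dijkstra's algorithm over a road graph whose edge costs are live travel-time estimates. The naive baseline, shown below, simply re-runs Dijkstra whenever the costs change; EMVLight's contribution is an efficient incremental update of this computation, which is not reproduced here.

```python
import heapq

def dijkstra(graph, source):
    """Shortest travel times from `source` under the current edge costs.
    `graph[u]` is a list of (v, travel_time) pairs. Re-running this with
    refreshed costs is the naive dynamic-routing baseline."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                           # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

graph = {"A": [("B", 4.0), ("C", 1.0)], "C": [("B", 1.5)], "B": []}
print(dijkstra(graph, "A"))  # shortest times: A=0.0, C=1.0, B=2.5 (via C)
```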

#22 Constrained Prescriptive Trees via Column Generation

Authors: Shivaram Subramanian ; Wei Sun ; Youssef Drissi ; Markus Ettl

With the abundance of available data, many enterprises seek to implement data-driven prescriptive analytics to help them make informed decisions. These prescriptive policies need to satisfy operational constraints, and proactively eliminate rule conflicts, both of which are ubiquitous in practice. It is also desirable for them to be simple and interpretable, so they can be easily verified and implemented. Existing approaches from the literature center around constructing variants of prescriptive decision trees to generate interpretable policies. However, none of the existing methods is able to handle constraints. In this paper, we propose a scalable method that solves the constrained prescriptive policy generation problem. We introduce a novel path-based mixed-integer program (MIP) formulation which identifies a (near) optimal policy efficiently via column generation. The policy generated can be represented as a multiway-split tree which is more interpretable and informative than binary-split trees due to its shorter rules. We demonstrate the efficacy of our method with extensive computational experiments on both synthetic and real datasets.

#23 DDGCN: Dual Dynamic Graph Convolutional Networks for Rumor Detection on Social Media

Authors: Mengzhu Sun ; Xi Zhang ; Jiaqi Zheng ; Guixiang Ma

Detecting rumors on social media has become particularly important due to their rapid dissemination and adverse impacts on our lives. Though a number of rumor detection models have exploited the structural or temporal information of message propagation, they seldom model both together to enjoy the best of both worlds. Moreover, the dynamics of the knowledge information associated with the comments are not considered either. To this end, we propose novel Dual-Dynamic Graph Convolutional Networks, termed DDGCN, which can model the dynamics of messages in propagation as well as the dynamics of background knowledge from knowledge graphs in one unified framework. Specifically, two Graph Convolutional Networks are adopted to capture the above two types of structural information at different time stages, which are then combined with a temporal fusing unit. This allows for learning dynamic event representations in a more fine-grained manner and incrementally aggregating them to capture the cascading effect for better rumor detection. Extensive experiments on two public real-world datasets demonstrate that our proposal yields significant improvements over strong baselines and can detect rumors at early stages.

#24 Contact-Distil: Boosting Low Homologous Protein Contact Map Prediction by Self-Supervised Distillation

Authors: Qin Wang ; Jiayang Chen ; Yuzhe Zhou ; Yu Li ; Liangzhen Zheng ; Sheng Wang ; Zhen Li ; Shuguang Cui

Accurate protein contact map prediction (PCMP) is essential for precise protein structure estimation and further biological studies. Recent works achieve significant performance on this task with high-quality multiple sequence alignments (MSA). However, PCMP accuracy drops dramatically when only a poor MSA (e.g., an absolute MSA count of less than 10) is available. Therefore, in this paper, we propose Contact-Distil to improve low-homologous PCMP accuracy through knowledge distillation on a self-supervised model. In particular, two pre-trained transformers are exploited to learn high-quality and low-quality MSA representations in parallel for the teacher and student models, respectively. Besides, co-evolution information is further extracted from the pure sequence through a pre-trained ESM-1b model, which provides auxiliary knowledge to improve student performance. Extensive experiments show Contact-Distil outperforms previous state-of-the-art methods by large margins on the CAMEO-L dataset for low-homologous PCMP, with around 13.3% and 9.5% improvements over AlphaFold2 and the MSA Transformer, respectively, when the MSA count is less than 10.
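
Since a contact map is an L x L grid of binary contacts, the teacher-to-student transfer can be pictured as a soft-target distillation loss between the two predicted maps. The sketch below is one plausible form under that assumption; the paper's full objective, including the ESM-1b auxiliary branch, is not shown.

```python
import torch
import torch.nn.functional as F

def contact_distill_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target distillation on an L x L contact map: the student (poor MSA)
    matches the teacher's (rich MSA) contact probabilities. An illustrative loss,
    not necessarily the paper's exact objective."""
    with torch.no_grad():
        soft_targets = torch.sigmoid(teacher_logits / temperature)
    return F.binary_cross_entropy_with_logits(student_logits / temperature, soft_targets)

L = 128  # sequence length
student = torch.randn(1, L, L, requires_grad=True)
teacher = torch.randn(1, L, L)
loss = contact_distill_loss(student, teacher)
loss.backward()
```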

#25 EtinyNet: Extremely Tiny Network for TinyML

Authors: Kunran Xu ; Yishi Li ; Huawei Zhang ; Rui Lai ; Lin Gu

Most AI applications are deployed in high-income countries because their implementation depends on expensive GPU cards (~$2,000) and a reliable power supply (~200 W). To deploy AI in resource-poor settings on cheaper (~$20) and low-power devices (<1 W), key modifications are required to adapt neural networks for Tiny Machine Learning (TinyML). In this paper, to fit CNNs into storage-limited devices, we develop efficient tiny models with only hundreds of KB of parameters. Toward this end, we first design a parameter-efficient tiny architecture by introducing a dense linear depthwise block. Then, a novel adaptive scale quantization (ASQ) method is proposed for further quantizing tiny models to aggressively low bit-widths while retaining accuracy. With the optimized architecture and 4-bit ASQ, we present a family of ultra-lightweight networks, named EtinyNet, that achieves 57.0% ImageNet top-1 accuracy with an extremely tiny model size of 340 KB. When deployed on an off-the-shelf commercial microcontroller for object detection tasks, EtinyNet achieves a state-of-the-art 56.4% mAP on Pascal VOC. Furthermore, experimental results on a Xilinx compact FPGA indicate that EtinyNet achieves a notably low power of 620 mW, about 5.6x lower than existing FPGA designs. The code and demo are available at https://github.com/aztc/EtinyNet
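
As a rough guess at what a "dense linear depthwise block" might look like (the exact EtinyNet block may differ), the sketch below combines a depthwise convolution kept linear, a pointwise convolution, and an identity shortcut when shapes allow:

```python
import torch
import torch.nn as nn

class LinearDepthwiseBlock(nn.Module):
    """A guess at a parameter-efficient 'linear depthwise' building block:
    a depthwise 3x3 convolution with no activation (linear), a pointwise 1x1
    convolution, and an identity shortcut when shapes match. EtinyNet's exact
    block and its dense connectivity may differ."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.dw = nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1, groups=in_ch, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)   # linear: no activation after the depthwise conv
        self.pw = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)
        self.use_shortcut = stride == 1 and in_ch == out_ch

    def forward(self, x):
        out = self.bn2(self.pw(self.bn1(self.dw(x))))
        if self.use_shortcut:
            out = out + x
        return self.act(out)

block = LinearDepthwiseBlock(32, 32)
y = block(torch.randn(1, 32, 56, 56))  # output shape: (1, 32, 56, 56)
```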